An Efficient Storm Identification from Big Rainfall Data Using MapReduce

Authors

Kulsawasd Jitkajornwanich

Ramez Elmasri

Chengkai Li

Abstract

This paper is part of my doctoral dissertation, “Analysis and Modeling Techniques for Geo-Spatial Datasets,” which focuses on how to summarize, model, and format spatiotemporal data for analysis and mining. The dissertation consists of four main components: (1) spatiotemporal knowledge representation, (2) identification of meaningful concepts from raw data, (3) conversion of raw data to conceptual data, and (4) analysis and mining of conceptual data. This paper belongs to the third component and describes an efficient MapReduce algorithm for converting raw rainfall data into meaningful storm information, which can then be easily analyzed and mined. Our previous work proposed a method to identify relevant storm characteristics from raw rainfall data. That original storm identification system takes too long to produce the summarized storm characteristics for two reasons: (1) the raw rainfall data, which qualifies as big data, is stored in a traditional relational database based on the CUAHSI (Consortium of Universities for the Advancement of Hydrologic Science, Inc.) ODM (Observations Data Model), which incurs substantial disk I/O; and (2) the storm identification algorithm is based on recursion and regular depth-first search (DFS), which retrieves parts of the data multiple times. In this paper, we obtain a substantial performance improvement by using MapReduce and by reading the original raw rainfall text files directly instead of the data in the relational database. In our experiments, the new storm identification system performs significantly better than the previous one. The new system will benefit hydrologists who perform rainfall-related analyses (both location-specific and storm-specific), such as flood prediction, using our identified storms.

Keywords: storm analysis; rainfall; big data; MapReduce; distributed computing; CUAHSI
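The abstract does not spell out how storms are assembled, so the following is only a minimal sketch under stated assumptions. It assumes each line of a raw text file is a hypothetical `hour row col rain_mm` record for one grid cell, that a local storm is a run of wet hours at one cell broken by more than `MAX_DRY_GAP` dry hours (a hypothetical threshold), and that local storms in neighbouring cells with overlapping time spans belong to the same overall storm. The map, shuffle, and reduce phases are simulated in plain Python rather than run on Hadoop, and the final grouping uses an iterative union-find. Union-find is a swapped-in technique chosen to show how recursion and repeated data retrieval can be avoided; it is not the authors' algorithm.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

# Hypothetical record layout: (cell, start_hour, end_hour, total_mm)
LocalStorm = Tuple[Tuple[int, int], int, int, float]

MAX_DRY_GAP = 1  # hypothetical: dry hours tolerated inside one local storm


def map_line(line: str) -> Iterable[Tuple[Tuple[int, int], Tuple[int, float]]]:
    """Map: parse one 'hour row col rain_mm' line; emit (cell, (hour, rain)) for wet cells only."""
    hour, row, col, rain = line.split()
    if float(rain) > 0.0:
        yield (int(row), int(col)), (int(hour), float(rain))


def reduce_cell(cell: Tuple[int, int], values: List[Tuple[int, float]]) -> List[LocalStorm]:
    """Reduce: split one cell's wet hours into local storms separated by long dry gaps."""
    storms: List[LocalStorm] = []
    start = end = None
    total = 0.0
    for hour, rain in sorted(values):
        # More than MAX_DRY_GAP dry hours since the last wet hour closes the current storm.
        if start is not None and hour - end > MAX_DRY_GAP + 1:
            storms.append((cell, start, end, total))
            start = None
        if start is None:
            start, total = hour, 0.0
        end = hour
        total += rain
    if start is not None:
        storms.append((cell, start, end, total))
    return storms


def find(parent: List[int], i: int) -> int:
    """Iterative union-find root lookup with path halving (no recursion)."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def merge_storms(local: List[LocalStorm]) -> List[List[LocalStorm]]:
    """Group local storms in 8-neighbouring cells whose time spans overlap."""
    parent = list(range(len(local)))
    for i, (ci, si, ei, _) in enumerate(local):
        for j in range(i + 1, len(local)):
            cj, sj, ej, _ = local[j]
            adjacent = max(abs(ci[0] - cj[0]), abs(ci[1] - cj[1])) <= 1
            overlap = si <= ej and sj <= ei
            if adjacent and overlap:
                parent[find(parent, i)] = find(parent, j)
    groups = defaultdict(list)
    for i, storm in enumerate(local):
        groups[find(parent, i)].append(storm)
    return list(groups.values())


def run(lines: Iterable[str]) -> List[List[LocalStorm]]:
    """Simulated job: map every line, shuffle by cell, reduce per cell, then merge."""
    shuffled = defaultdict(list)
    for line in lines:
        for key, value in map_line(line):
            shuffled[key].append(value)
    local = [s for cell, vals in sorted(shuffled.items()) for s in reduce_cell(cell, vals)]
    return merge_storms(local)
```

Because each reducer receives every record for one cell, the per-location step needs only a single pass over the text files, and only the much smaller set of local storms reaches the merge step.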

Similar Resources

Cloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming

The objective of this study is to verify the importance of the capabilities of cloud computing services in managing and analyzing big data in business organizations because the rapid development in the use of information technology in general and network technology in particular, has led to the trend of many organizations to make their applications available for use via electronic platforms hos...

An Improved Performance Evaluation on Large-Scale Data using MapReduce Technique

Abstract: In a day-to-day life, the capacity of data increased enormously with time. The growth of data which will be unmanageable in social networking sites like Facebook, Twitter. In the past two years the data flow can increase in zettabyte. To handle big data there are number of applications has been developed. However, analyzing big data is a very challenging task today. Big Data refers to...

A Novel Approach for Identification of Hadoop Cloud Temporal Patterns Using Map Reduce

− Due to the latest developments in the area of science and Technology resulted in the developments of efficient data transfer, capability of handling huge data and the retrieval of data efficiently. Since the data that is stored is increasing voluminously, methods to retrieve relative information and security related concerns are to be addressed efficiently to secure this bulk data. Also with ...

Collaborative Filtering Recommendation using Matrix Factorization: A MapReduce Implementation

Matrix Factorization based Collaborative Filtering (MFCF) has been an efficient method for recommendation. However, recent years have witness the explosive increasing of big data, which contributes to the huge size of users and items in recommender systems. To deal with the efficiency of MFCF recommendation in the context of big data challenge, we propose to leverage MapReduce programming model...

Efficient Entity Matching over Multiple Data Sources with MapReduce

The execution of data-intensive tasks such as entity matching on large data sources has become a common demand in the era of Big Data. To face this challenge, cloud computing has proven to be a powerful ally to efficient parallel the execution of such tasks. In this work we investigate how to efficiently perform entity matching over multiple large data sources using the MapReduce programming mo...

Publication date: 2013